Identifying Multidocument Relations

Authors

  • Erick Galani Maziero
  • Maria Lucía del Rosario Castro Jorge
  • Thiago Alexandre Salgueiro Pardo
Abstract

The digital world generates an enormous volume of information. Much of it is redundant, complementary, or contradictory, and it may come from several sources. Applications such as multidocument summarization and question answering must handle this information, and they need to identify the relations among the various texts in order to accomplish their tasks. In this paper we first describe an effort to create and annotate a corpus of news texts with multidocument relations from the Cross-document Structure Theory (CST), and then present a machine learning experiment for the automatic identification of some of these relations. We show that our results for both tasks are satisfactory.

Similar resources

Integrating the UMLS into an RDF-Based Biomedical Knowledge Repository.

As part of Advanced Library Services project at the National Library of Medicine, we are creating a very large Biomedical Knowledge Repository (BKR), which serves as background knowledge for applications including knowledge discovery and multidocument summarization [1]. The BKR integrates relations extracted from the biomedical literature (e.g., Medline citations) and from structured knowledge ...

Towards Multidocument Summarization by Reformulation: Progress and Prospects

By synthesizing information common to retrieved documents, multi-document summarization can help users of information retrieval systems to find relevant documents with a minimal amount of reading. We are developing a multidocument summarization system to automatically generate a concise summary by identifying and synthesizing similarities across a set of related documents. Our approach is uniqu...

Multi-Document Discourse Parsing Using Traditional and Hierarchical Machine Learning

Multi-document handling is essential today, when many documents on the same topic are produced, especially considering the Web. Both readers and computer applications can benefit from a discourse analysis of this multidocument content, since it demonstrates clearly the relations among portions of these documents. This work aims to identify such relations automatically using machine learning tec...

Paraphrasing and Translation

Usefulness of paraphrases

  • Paraphrases are alternative ways of conveying the same information
  • Useful in NLP applications such as:
      – Generation: producing paraphrases allows for the creation of more varied and fluent text
      – Multidocument summarization: identifying paraphrases allows information repeated across documents to be condensed
      – Question answering: paraphrasing is important when going be...

Machine and Human Performance for Single and Multidocument Summarization

coherency—and be able to draw the “best” information from a set of documents. Automatic single-document text summarization1 has been an active research area since the 1950s, with a renaissance of approaches since the 1990s. Human single-document summarization is well defined when guidelines and recommendations drive performance.2,3 System-generated single-document summaries, while not always ma...

Publication date: 2010